Abstract

We study the effect of the non-Gaussian clustering of galaxies on the statistics of 
pencil beam surveys. We find that the higher order moments of the galaxy distribution 
play an important role in the probability distribution for the power spectrum peaks. 
Taking into account the observed values for the kurtosis of the galaxy distribution, we derive
the general probability distribution for the power spectrum modes in non-Gaussian
models and show that the probability to obtain
the $128\,h^{-1}$ Mpc periodicity found in pencil beam surveys is raised by roughly one
order of magnitude. The non-Gaussianity of the galaxy distribution is however still
insufficient to explain the reported peak-to-noise ratio of the periodicity, so that
extra power on large scales seems required. 
1 Introduction



The surprising discovery of a $128\,h^{-1}$ Mpc periodicity in the distribution of galaxies
(Broadhurst et al. 1989) has raised an intense debate about the statistical significance of the
detected signal. The main question is whether
the periodicity is consistent with the local observations or is rather to be regarded as
a new feature appearing only when very large scales ($\gtrsim 100\,h^{-1}$ Mpc) are probed. In the
original paper by Broadhurst et al. (1989; BEKS) the statistical significance of the peak
in the one-dimensional power spectrum was assessed by means of an external estimator,
i.e. by adopting a model for the clustering of galaxies. The clustering was assumed to be
described by the usual correlation function $\xi(r) = (r/r_0)^{-\gamma}$ up to the scale of $30\,h^{-1}$ Mpc,
without any correlation beyond this scale, and without any higher order moment. As Szalay
et al. (1991) pointed out, however, external estimators are very model dependent. Even
slightly different assumptions, concerning e.g. selection functions or the parameters $r_0, \gamma$,
can result in dramatic variations of the statistical significance. Indeed, Kaiser & Peacock (1991),
investigating
essentially the same dataset as BEKS, found that the noise level had to be raised
significantly, resulting in a much higher probability of finding a peak as large as, or larger than, the one at
$128\,h^{-1}$ Mpc, so as to reconcile the standard model of galaxy clustering with the BEKS data.
Similarly, Luo & Vishniac (1993) found that redshift distortions can alter the estimate
of the noise level.

This seems to force one to use internal estimators of the noise level. This has been done
by Szalay et al. (1991), who showed that the probability of finding a peak as high as
the one in the BEKS data, or higher, is $2.2 \cdot 10^{-4}$, matching their original estimate. Luo
& Vishniac (1993) also confirmed the result that, while the rest of the power spectrum is
consistent with the hypotheses of clustering and Gaussianity, the single prominent spike at
$128\,h^{-1}$ Mpc is not.

They also showed that even a delta-like feature in the three-dimensional power spectrum of
the galaxy distribution can barely account for the BEKS spike. As we will show below, the
probability estimate on which these conclusions are based relies essentially on two hypotheses:
a) that the spatial bins of the BEKS survey are uncorrelated, i.e. that the clustering beyond
$30\,h^{-1}$ Mpc is negligible, and b) that the components of the power spectrum can be assumed,
by virtue of the central limit theorem, to be Gaussian distributed. The very fact that the
probability estimate based on these two hypotheses is as low as $2.2 \cdot 10^{-4}$ points to the
conclusion that one of the two, or both, are false. This implies either that there is some
previously unknown, and theoretically unexpected, feature in the
three-dimensional power spectrum at large scales, or that the other hypothesis,
Gaussianity, must be abandoned.

The first possibility has been explored for instance in the Voronoi
simulations (see e.g. Coles 1990; SubbaRao & Szalay 1992), or in truncated HDM models
(Weiss & Buchert 1993). Unlike the previous studies, in this paper we consider in detail
the latter possibility.

The scheme of this paper is as follows. First, we derive the probability distribution of the
components of a one-dimensional power spectrum in the presence of higher order moments of
the spatial distribution. Second, we ask what the probability is of finding a spike as
high as, or higher than, the one in the BEKS data in such a non-Gaussian galaxy distribution.
Finally, adopting the actual higher order moments found in local ($< 100\,h^{-1}$ Mpc) observations,
we will show that the formal probability for the BEKS periodicity increases roughly
by an order of magnitude. This, however, may still be insufficient to explain the data.

Let us note that we will not question in any way the reliability of the BEKS data or 
of their noise estimate. Rather, we derive our conclusion only taking into due account the 
already known level of non-Gaussianity in the galaxy distribution. 



2 Non-Gaussian pencil beam statistics 

The BEKS data consist of a set of counts along a survey geometry that approximates a long,
thin cylinder directed towards the galactic poles. The galaxy positions are binned in small
cylinders of radius $R = 3\,h^{-1}$ Mpc and radial length $30\,h^{-1}$ Mpc, out to $L/2 \simeq 1000\,h^{-1}$ Mpc
in both directions. The details of the survey are given in the original paper (BEKS) and in
Szalay et al. (1991). Let us denote the cell counts as $n_j$, with $j = 1, \ldots, N$ and $N = 67$. The discrete
Fourier transform of the dataset is

$$f_k = \frac{1}{P} \sum_{j=1}^{N} n_j \exp(i 2\pi k r_j / L) , \qquad (1)$$

where $r_j = 30 j\,h^{-1}$ Mpc is the radial distance to the $j$-th bin, and $P = \sum_j n_j$ is the
total number of galaxies (396 in BEKS). The counts $n_j$ have mean $\bar n = P/N$ and variance
$\sigma^2 = \langle (n_j - \bar n)^2 \rangle$, as well as higher order irreducible moments (or cumulants, or connected
moments) $k_n$.

The power spectrum is defined as 

$$A_k = |f_k|^2 . \qquad (2)$$

Let us define the quantity 

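$$a_k = \frac{\sum_j n_j \cos(2\pi k r_j / L)}{\sigma \sqrt{N/2}} . \qquad (3)$$

Squaring $a_k$ we obtain $a_k^2 = (\mathrm{Re} f_k)^2 / [\sigma^2 (N/2P^2)]$. Likewise, we can define the quantity
$b_k = \sum_j n_j \sin(2\pi k r_j / L) / (\sigma \sqrt{N/2})$ and form the modulus

$$a_k^2 + b_k^2 = \frac{A_k}{\sigma^2 (N/2P^2)} . \qquad (4)$$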
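As an illustrative check of definitions (1)-(4), the sketch below computes the Fourier mode, the power and the normalized components for a set of bin counts. It is not the original analysis: the counts are synthetic random integers rather than the BEKS data, and the function name `pencil_beam_modes`, the bin width `dr` and the choice $L = N \times 30\,h^{-1}$ Mpc are assumptions made only for this example.

```python
import cmath
import math
import random

def pencil_beam_modes(counts, k, dr=30.0):
    """Return (f_k, A_k, a_k, b_k, scale) for bin counts n_j at r_j = dr * j.

    Assumption of this sketch: L = N * dr, so the phase is 2*pi*k*j/N.
    `scale` is sigma^2 * N / (2 P^2), the denominator of Eq. (4).
    """
    N = len(counts)
    P = sum(counts)                                   # total number of galaxies
    L = N * dr
    mean = P / N
    var = sum((n - mean) ** 2 for n in counts) / N    # variance sigma^2 of the counts
    r = [dr * (j + 1) for j in range(N)]
    phases = [2.0 * math.pi * k * rj / L for rj in r]
    f_k = sum(n * cmath.exp(1j * p) for n, p in zip(counts, phases)) / P   # Eq. (1)
    A_k = abs(f_k) ** 2                                                    # Eq. (2)
    norm = math.sqrt(var * N / 2.0)
    a_k = sum(n * math.cos(p) for n, p in zip(counts, phases)) / norm      # Eq. (3)
    b_k = sum(n * math.sin(p) for n, p in zip(counts, phases)) / norm
    scale = var * N / (2.0 * P ** 2)
    return f_k, A_k, a_k, b_k, scale

random.seed(1)
counts = [random.randint(0, 12) for _ in range(67)]   # synthetic, NOT the BEKS counts
f_k, A_k, a_k, b_k, scale = pencil_beam_modes(counts, k=5)
print(a_k ** 2 + b_k ** 2, A_k / scale)               # the two sides of Eq. (4)
```

The two printed numbers should agree to rounding error, which is all that Eq. (4) states; nothing here depends on the statistical properties of the synthetic counts.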

The problem is now to find the probability distribution density (PDD) of $A_k$ when
we know the one for $n_j$. First, however, we have to derive the PDD of $a_k$ and $b_k$. They
are constructed as a linear sum of independent variables (as long as the various $n_j$ are
uncorrelated), so by the central limit theorem $a_k, b_k$ should tend to be Gaussian distributed.
However, since $N \sim 70$ is not really very large, one should check whether the higher order terms
are significant. This is indeed what will be shown to happen. We make use of the so-called
Edgeworth expansion (see e.g. Cramér 1966; Abramowitz & Stegun 1972, whose notation
we will follow), according to which the variable

$$X = \frac{\sum_i (Y_i - m_i)}{\left(\sum_i \sigma_i^2\right)^{1/2}} \qquad (5)$$

(the sums run over $N$ terms), where the $Y_i$ are independent random variables with mean $m_i$,
variance $\sigma_i^2$ and $n$-th order cumulants $k_{n,i}$, is distributed like a function $f(X)$ that can be
expanded in powers of $N^{-1/2}$:



$$f(X) \sim G(X) \left[ 1 + \frac{\gamma_1}{6 N^{1/2}} He_3(X) + \frac{\gamma_2}{24 N} He_4(X) + \frac{\gamma_1^2}{72 N} He_6(X) + O(N^{-3/2}) \right] . \qquad (6)$$
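A minimal numerical sketch of the expansion (6) may help to fix ideas. It evaluates the truncated Edgeworth density with probabilists' Hermite polynomials and checks two properties that follow from their orthogonality: the density stays normalized, and its fourth moment is shifted from 3 to $3 + \gamma_2/N$ when $\gamma_1 = 0$. The helper names (`hermite_e`, `edgeworth_pdf`, `integrate`) and the sample values $N = 67$, $\gamma_2 = 20$ are illustrative assumptions.

```python
import math

def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x), built with the recurrence
    He_{m+1}(x) = x He_m(x) - m He_{m-1}(x)."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for m in range(1, n):
        h_prev, h = h, x * h - m * h_prev
    return h

def edgeworth_pdf(x, N, gamma1=0.0, gamma2=0.0):
    """Edgeworth density of Eq. (6), truncated at order 1/N."""
    gauss = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    corr = (gamma1 / (6.0 * math.sqrt(N)) * hermite_e(3, x)
            + gamma2 / (24.0 * N) * hermite_e(4, x)
            + gamma1 ** 2 / (72.0 * N) * hermite_e(6, x))
    return gauss * (1.0 + corr)

def integrate(func, lo, hi, steps=20000):
    """Plain trapezoidal rule; adequate for a smooth, rapidly decaying integrand."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    total += sum(func(lo + i * h) for i in range(1, steps))
    return total * h

N, g2 = 67, 20.0   # illustrative values only
norm = integrate(lambda x: edgeworth_pdf(x, N, gamma2=g2), -12.0, 12.0)
m4 = integrate(lambda x: x ** 4 * edgeworth_pdf(x, N, gamma2=g2), -12.0, 12.0)
print(norm, m4, 3.0 + g2 / N)
```

Note that the truncated series is not guaranteed to be positive everywhere for large $\gamma_2/N$; the sketch only illustrates its moments.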



Here, $G(X)$ is the normal distribution, $He_n$ is the Hermite polynomial of order $n$, and
$\gamma_1 = (\sum_i k_{3,i}/N) / (\sum_i \sigma_i^2/N)^{3/2}$, $\gamma_2 = (\sum_i k_{4,i}/N) / (\sum_i \sigma_i^2/N)^2$. The Edgeworth expansion has
been used recently in astrophysics by several authors to quantify slight deviations from
Gaussianity (Juszkiewicz et al. 1993, and references therein). Now we can notice that, as
long as the counts $n_j$ are uncorrelated, the variables $a_k$ and $b_k$ are indeed of the form (5),
with $Y_i = n_i \cos(2\pi k r_i / L)$ [or $Y_i = n_i \sin(2\pi k r_i / L)$], so we are allowed to
apply the Edgeworth expansion. To the order $1/N$, the expansion will include the skewness
and the kurtosis of the counts $n_i$.

However, the skewness sum $\sum_i k_{3,i}$ for the variables $Y_i$ vanishes due to the oscillating term,
so that $\gamma_1 = 0$.

Let us then estimate the expansion coefficient $\gamma_2$ in our case. The higher order moments
in the galaxy counts have been calculated by several authors for different surveys (Saunders et
al. 1991; Bouchet, Davis & Strauss 1992; Gaztañaga 1992; Loveday et al. 1992). The general
result is that, for scales which range from a few megaparsecs to more than $50\,h^{-1}$ Mpc, the
dimensionless cumulants $\mu_m = k_m/\bar n^m$ (for $m = 2$, $\mu_2 = \sigma^2/\bar n^2$)
obey the hierarchical scaling relation

$$\mu_m = S_m \mu_2^{m-1} , \qquad (7)$$

where $S_m$ are the scaling constants (we have checked that the shot-noise correction is negligible
in our case). To the lowest order in the variance and for scales much larger than the
correlation length of the fluctuation field, the scaling relation is in reality a direct consequence
of the Edgeworth expansion (and can actually be derived by a much simpler argument, see
Amendola & Borgani 1994). Then we see that

$$\gamma_2 \simeq (3/2) S_4 \mu_2 , \qquad (8)$$

where the numerical factor is due to the sum over the sines and cosines in $\sum_i k_{4,i}$. The
relations between the scaling constants and the physics of the clustering process have been
investigated in several works, from the book of Peebles (1980) to recent generalizations as
in Bernardeau (1992).
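The vanishing of the skewness sum and the numerical factor 3/2 can both be checked directly. Since the $m$-th cumulant of $c\,n$ is $c^m$ times that of $n$, one has $\sigma_i^2 = \sigma^2 \cos^2\theta_i$, $k_{3,i} = k_3 \cos^3\theta_i$ and $k_{4,i} = k_4 \cos^4\theta_i$, so that $\gamma_1 \propto \langle\cos^3\rangle$ and $\gamma_2 = (k_4/\sigma^4)\,\langle\cos^4\rangle / \langle\cos^2\rangle^2$. The sketch below evaluates these averages over the bins; it assumes equally spaced bins with $\theta_j = 2\pi k j/N$ (i.e. $L = N \times 30\,h^{-1}$ Mpc), and the function names are hypothetical.

```python
import math

def trig_moments(N, k):
    """Averages of cos^2, cos^3 and cos^4 of theta_j = 2*pi*k*j/N over the N bins."""
    phases = [2.0 * math.pi * k * j / N for j in range(1, N + 1)]
    c2 = sum(math.cos(p) ** 2 for p in phases) / N
    c3 = sum(math.cos(p) ** 3 for p in phases) / N
    c4 = sum(math.cos(p) ** 4 for p in phases) / N
    return c2, c3, c4

def gamma2_factor(N, k):
    """Numerical factor multiplying k_4 / sigma^4 in gamma_2."""
    c2, _, c4 = trig_moments(N, k)
    return c4 / c2 ** 2

c2, c3, c4 = trig_moments(67, 5)
print(c2, c3, c4, gamma2_factor(67, 5))
```

For generic $k$ the averages are 1/2, 0 and 3/8, giving the factor $(3/8)/(1/4) = 3/2$; with $k_4/\sigma^4 = \mu_4/\mu_2^2 = S_4 \mu_2$ from the hierarchical scaling, this is the coefficient quoted in the text.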

The value of $\mu_2 = \sigma^2/\bar n^2$ can be expressed as a function of
the correlation function (e.g. Peebles 1980), $\sigma^2 = \bar n + \bar n^2 \xi_0$, where
$\xi_0 = V^{-2} \int d^3r_1\, d^3r_2\, W(r_1) W(r_2)\, \xi(r_1 - r_2)$
and where $W$ is the window function corresponding to the BEKS cylindrical cells of volume
$V$. Since $\bar n \simeq 6$ and $\xi_0$ is of order unity, we can approximate $\mu_2$ with $\xi_0$, so that
$\gamma_2 \simeq (3/2) S_4 \xi_0$. The value of $\gamma_2$ will turn out to be crucial.

Several uncertainties, however, prevent an exact estimate. For $\xi_0$ we must rely on very
local observations; we may assume the value given in Szalay et al. (1991),
$\xi_0 \simeq 0.83$, or the one that we derive from Luo & Vishniac (1993), $\xi_0 \simeq 1.24$, or similar
values, depending on the model of the correlation function. For $S_4$ one problem is that we need
the scaling constants for quite elongated cylindrical cells, while the observations have been
carried out mostly for large spherical or cubic cells. We can find observational values from
near unity to 30 or 40
(see e.g. the table in Gaztañaga 1992). Further, Lahav et al. (1993) find that $S_3$ and
$S_4$, rather than being constants, sharply increase with the rms density contrast $\delta$, and thus
decrease with the cell volume, when $\delta > 1$ in CDM simulations. We then absorb the
uncertainties of $S_4$ and of $\xi_0$ in $\gamma_2$ and explore numerically the range $\gamma_2 \in (0, 40)$.
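To get a feeling for the numbers involved, the following sketch tabulates $\gamma_2 \simeq (3/2) S_4 \xi_0$ for the two quoted values of $\xi_0$ and a few trial values of $S_4$. The trial values of $S_4$ are arbitrary picks within the "near unity to 30 or 40" interval mentioned above, not measurements, and the function names are hypothetical. The exact $\mu_2 = 1/\bar n + \xi_0$ is also shown, so that the size of the neglected shot-noise term $1/\bar n$ can be judged.

```python
def mu2(xi0, nbar):
    """Dimensionless variance mu_2 = sigma^2 / nbar^2 = 1/nbar + xi0."""
    return 1.0 / nbar + xi0

def gamma2(S4, xi0):
    """Kurtosis coefficient gamma_2 ~ (3/2) S4 xi0, with mu_2 approximated by xi0."""
    return 1.5 * S4 * xi0

nbar = 396 / 67                       # mean count per bin from P = 396, N = 67
for xi0 in (0.83, 1.24):              # values quoted in the text
    for S4 in (1.0, 10.0, 30.0):      # trial values, chosen only for illustration
        print(xi0, S4, round(gamma2(S4, xi0), 2), round(mu2(xi0, nbar), 3))
```

The table makes clear why a broad range of $\gamma_2$ has to be explored: the spread in $S_4$ dominates over the uncertainty in $\xi_0$.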

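Let us come back to the Edgeworth PDD $f(a_k)$ for $a_k$ (to the order $1/N$). Since $f$ is even
when $\gamma_1 = 0$, the two branches $a_k = \pm y^{1/2}$ contribute equally, and the PDD for
$y = a_k^2$ is then $P(y) = 2 f(a_k) |da_k/dy| = f(y^{1/2})/y^{1/2}$, that is

$$P(y = a_k^2) \sim \frac{1}{g_1} \left[ g_1 P_1 + \frac{\gamma_2}{24 N} \left( g_5 P_5 - 6 g_3 P_3 + 3 g_1 P_1 \right) \right] = \sum_i c_i P_i(y) , \qquad (9)$$

where $g_n = 2^{n/2} \Gamma(n/2)$ and $P_n(y) = g_n^{-1} y^{n/2-1} e^{-y/2}$ is the $\chi^2$ PDD with $n$ degrees of freedom.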
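The decomposition into $\chi^2$ densities can be verified numerically. The sketch below compares the density of $y = a_k^2$ obtained by a direct change of variable in the symmetric Edgeworth density (summing the branches $a_k = \pm y^{1/2}$) with the $\chi^2$ sum; using $g_3 = g_1$ and $g_5 = 3 g_1$, the latter reduces to $P_1 + (\gamma_2/8N)(P_5 - 2P_3 + P_1)$. Function names and the sample values of $N$ and $\gamma_2$ are assumptions of this example.

```python
import math

def chi2_pdf(n, y):
    """P_n(y) = y^(n/2 - 1) exp(-y/2) / g_n, with g_n = 2^(n/2) Gamma(n/2)."""
    g_n = 2.0 ** (n / 2.0) * math.gamma(n / 2.0)
    return y ** (n / 2.0 - 1.0) * math.exp(-0.5 * y) / g_n

def pdf_from_edgeworth(y, N, gamma2):
    """Density of y = a^2 from the even Edgeworth density f(a), gamma_1 = 0,
    counting both branches a = +sqrt(y) and a = -sqrt(y)."""
    a = math.sqrt(y)
    he4 = a ** 4 - 6.0 * a ** 2 + 3.0
    f = math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi) * (1.0 + gamma2 / (24.0 * N) * he4)
    return f / a

def pdf_chi2_sum(y, N, gamma2):
    """The same density written as P_1 + (gamma2 / 8N) (P_5 - 2 P_3 + P_1)."""
    c = gamma2 / (8.0 * N)
    return chi2_pdf(1, y) + c * (chi2_pdf(5, y) - 2.0 * chi2_pdf(3, y) + chi2_pdf(1, y))

for y in (0.1, 1.0, 5.0, 20.0):
    print(y, pdf_from_edgeworth(y, 67, 20.0), pdf_chi2_sum(y, 67, 20.0))
```

The two columns should coincide to rounding error for any $y > 0$.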

Now that we have the PDD for $a_k^2, b_k^2$ we must find the distribution for $z = a_k^2 + b_k^2$. Let us
denote with $\phi(t)$ the characteristic function (CF) of a generic probability distribution $P(x)$,
where $\phi(t) = \int e^{itx} P(x)\,dx$. The general theorems about probability distributions say that
the CF of the sum of two independent variables is the product of the CFs of the variables. Furthermore,
by linearity, we see that the CF of $P = P_1 + P_2$ is $\phi(P_1) + \phi(P_2)$. We use these two
properties to derive the general distribution $P(A_k)$. First, we calculate the CF $\phi(P)$ for $P(y)$
given by (9),
$P(y) = \sum_i c_i P_i$. Denoting the CF for the $\chi^2$ distribution $P_n$ as $\psi_n = (1 - 2it)^{-n/2}$, we
have

$$\phi[P(z)] = \phi[P(a_k^2)]\, \phi[P(b_k^2)] = \phi^2[P(y)] = \Bigl( \sum_i c_i \psi_i \Bigr)^2 , \qquad (10)$$

where the sum runs over all the PDDs in the
expansion (9),
with the same $c_i$'s. Now, since $\psi_n \psi_m = \psi_{n+m}$, we can see that the CF for the unknown
distribution
$P(z)$ is a sum of $\chi^2$ CFs, so that the final result $P(z)$ is again a sum of $\chi^2$ PDDs.
Before writing down the result, we note that $z = A_k/[\sigma^2 (N/2P^2)] = 2A_k/A_0$, where $A_0$
is the
noise level in the notation of BEKS. It follows that $A_0 = \sigma^2 N/P^2 = \xi_0/N + 1/P$, which
gives an external estimate of the noise level. However, as already remarked, the estimate of
$A_0$ by Szalay et al. (1991) is internal in that it is not based on an a priori model for $\xi(r)$, but
rather on fitting the observational distribution function for $A_k$ at small amplitudes with the
exponential $P(A_k) = (1/A_0) \exp(-A_k/A_0)$, as it should be in the purely Gaussian case (or for
$N \to \infty$). The same internal estimate applies here since, as we will see, the Gaussian and
non-Gaussian PDDs are equivalent at low amplitudes.

Finally, the normalized distribution function for $A_k$ to the order $1/N$ is

$$P(z = 2A_k/A_0) = P_2 + a\,(P_6 - 2P_4 + P_2) , \qquad (11)$$

where $a = \gamma_2/4N$.
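For reference, the algebra leading from the characteristic functions to this result can be written out explicitly. This is only a sketch under the assumptions already made in the text ($\gamma_1 = 0$, terms beyond order $1/N$ dropped); the shorthand $c = \gamma_2/8N$ is introduced here for brevity and uses $g_3 = g_1$, $g_5 = 3 g_1$.

```latex
\begin{align*}
\phi[P(y)] &= \psi_1 + c\,(\psi_5 - 2\psi_3 + \psi_1),
  \qquad c = \frac{\gamma_2}{8N},\\
\phi[P(z)] &= \bigl(\phi[P(y)]\bigr)^2
  = \psi_2 + 2c\,(\psi_6 - 2\psi_4 + \psi_2) + O(c^2),\\
P(z) &= P_2 + a\,(P_6 - 2P_4 + P_2),
  \qquad a = 2c = \frac{\gamma_2}{4N}.
\end{align*}
```

The second line uses $\psi_n \psi_m = \psi_{n+m}$, and the discarded $O(c^2)$ term is of order $1/N^2$, consistent with the truncation of the Edgeworth series.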

Eq. (11) then gives the general PDD for the power spectrum
amplitudes of a set of pencil beam counts with scaling coefficient $S_4$.
When $S_4 = 0$ we return to the exponential distribution $P(z) = P_2$ on which the calculation
of BEKS, and of all the other works on the subject, was based. We can see from $P(z)$
why the higher order terms are important. Since the peak-to-noise ratio $X = A_k/A_0$ found
by BEKS is very large, $X_{\rm BEKS} = 11.8$, the terms containing higher order functions will
dominate over the $P_2$ term when integrated to give the cumulative probability, even if the
constant $a$ is small, i.e. even if $N$ is large. Actually, for any given $a$ there is a value $z_c$
such that the higher order terms dominate over the lower orders in the integral $\int_{z_c}^{\infty} P(z)\,dz$.
This is a consequence of the fact that, while the convergence of any distribution $f(X)$ to the
normal one for $N \to \infty$ is ensured by the central limit theorem, the convergence itself need
not be uniform. The fractional difference between the cumulative distribution of $f(X)$ and
the one relative to a normal distribution can be arbitrarily large for large deviations from
the mean.
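The dominance of the higher order terms in the tail can be made explicit. As a sketch, integrating the distribution $P(z) = P_2 + a\,(P_6 - 2P_4 + P_2)$ term by term with the standard $\chi^2$ tail integrals for even degrees of freedom gives a closed form; the crossover value $X_c$ below (with $z_c = 2X_c$) is a quantity introduced here for illustration.

```latex
\begin{align*}
\int_z^\infty P_2\,dz' &= e^{-z/2}, \quad
\int_z^\infty P_4\,dz' = e^{-z/2}\Bigl(1 + \frac{z}{2}\Bigr), \quad
\int_z^\infty P_6\,dz' = e^{-z/2}\Bigl(1 + \frac{z}{2} + \frac{z^2}{8}\Bigr),\\
\int_{2X}^\infty P(z)\,dz &= e^{-X}\Bigl[\,1 + a\Bigl(\frac{X^2}{2} - X\Bigr)\Bigr],\\
a\Bigl(\frac{X_c^2}{2} - X_c\Bigr) &= 1
  \;\Longrightarrow\; X_c = 1 + \sqrt{1 + 2/a}\,.
\end{align*}
```

For any $a > 0$, however small, the non-Gaussian correction therefore exceeds the Gaussian term $e^{-X}$ once $X > X_c$, and the ratio of the two tails grows like $a X^2/2$.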

We can now directly compare the PDD (11) with the power spectrum coefficients found
by BEKS. We use the tabulated values provided by Luo & Vishniac (1993), binned in peak-to-noise
intervals of 0.5. We plot in Fig. 1 the cumulative function of the BEKS coefficients
versus peak-to-noise ratio (a point at abscissa $x$ represents the fraction of values of $A_k$ in
the BEKS data with peak-to-noise ratio larger than $x$) and compare this with our theoretical
cumulative function

$$F(X) = \int_{2X}^{+\infty} P(z)\,dz , \qquad (12)$$

where $X = A_k/A_0 = z/2$ is the peak-to-noise ratio. The functions plotted are for the
Gaussian case ($S_4 = 0$), and for three possible values of the constant $\gamma_2$: from bottom to
top, $\gamma_2 = 5, 20, 40$. It is clear that as $\gamma_2$ increases, the
observed distribution becomes more and more consistent with the non-Gaussian behavior,
except for the last point, the $128\,h^{-1}$ Mpc spike, which
still appears far from its expected frequency. However, we can now estimate
the probability of having $X_{\rm BEKS} = 11.8$ or higher in one of the $\sim 30$ $k$-bins to which BEKS
assigned the data (see Szalay et al. 1991 for a detailed exposition) and compare it with
the very unlikely value $2.2 \cdot 10^{-4}$ originally found for $\gamma_2 = 0$. The non-Gaussian result is

$$P(> 11.8) \simeq 30 F(11.8) = 0.001 - 0.005 , \qquad (13)$$

for the range $\gamma_2 = 10 - 40$.
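The estimate above can be explored with a few lines of code. The sketch below uses the closed-form tail $F(X) = e^{-X}[1 + a(X^2/2 - X)]$, which follows from integrating $P(z) = P_2 + a\,(P_6 - 2P_4 + P_2)$ over $z > 2X$ with $a = \gamma_2/4N$. The value $N = 67$ and the function names are assumptions of this example. In the Gaussian limit it gives $30\,e^{-11.8} \simeq 2.2 \cdot 10^{-4}$; for $\gamma_2 > 0$ the exact figures depend on $N$ and on the conventions adopted, so the output should be read as an order-of-magnitude illustration rather than as a reproduction of the quoted range.

```python
import math

def tail_probability(X, gamma2, N):
    """F(X): fraction of modes with peak-to-noise ratio above X, from integrating
    P(z) = P_2 + a (P_6 - 2 P_4 + P_2) over z > 2X, with a = gamma2 / (4 N)."""
    a = gamma2 / (4.0 * N)
    return math.exp(-X) * (1.0 + a * (0.5 * X * X - X))

def spike_probability(X, gamma2, N=67, n_bins=30):
    """Probability of a peak above X in one of n_bins modes,
    approximated as n_bins * F(X)."""
    return n_bins * tail_probability(X, gamma2, N)

for g2 in (0.0, 10.0, 20.0, 40.0):
    p = spike_probability(11.8, g2)
    print(g2, p, p / spike_probability(11.8, 0.0))   # probability and enhancement factor
```

The last column shows how the enhancement over the Gaussian value grows linearly with $\gamma_2$ at fixed $X$.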

The inclusion of non-Gaussianity raises the probability of obtaining the BEKS spike by
about one order of magnitude, without any need to invoke non-standard features in the
galaxy distribution. The result (13) states that the BEKS spike should occur in roughly 0.1-0.5%
of the cases if the very large scale galaxy distribution is to be consistent with the local
observations of variance and kurtosis. If further data do not reduce the peak significance,
our result
indicates that it is very difficult for non-Gaussianity alone to explain the observations.

In Fig. 2 we display the behavior of $P(> 11.8)$ vs. $\gamma_2$. Only for very large values of $\gamma_2$,
and hence of $S_4$ or of $\xi_0$, does the BEKS spike approach the $3\sigma$ level. In other words, if the
BEKS periodicity is strengthened by further data, we will be forced to assume values of $S_4$
or $\xi_0$ larger than local observations would require, and/or to discard the assumption that
the spatial bins are uncorrelated. On the other hand, a value of $X_{\rm BEKS}$ smaller by even
ten percent would result in considerably higher values of $P(> 11.8)$, as shown by the dot-dashed
curve in Fig. 2.

3 Conclusions 

We have shown that the higher order moments of the galaxy clustering play a non-negligible
role in assessing the significance level of peaks in one-dimensional power spectra. The scaling
constant $S_4$ and the correlation average $\xi_0$ on the spatial bin in a pencil beam survey are
combined in the crucial parameter $\gamma_2$. Assuming that the spatial bins are uncorrelated, the
question raised by the remarkable periodicity discovered by Broadhurst et al. (1989) in the
very large scale galaxy clustering can then be expressed in the following way: are the values
of $S_4$ and of $\xi_0$
determined by local observations compatible with the clustering of galaxies at the very
deep scales probed by the pencil beam? To give an answer, we have to determine the
probability distribution for the power spectrum amplitudes of a non-Gaussian field sampled
in spatial bins. We find by means of the Edgeworth expansion that the most prominent BEKS
spike, around $128\,h^{-1}$ Mpc, has a probability of roughly $(1 - 5) \cdot 10^{-3}$ for acceptable values of
$\gamma_2$, to be compared with the value $2.2 \cdot 10^{-4}$ obtained by Szalay et al. (1991) neglecting
the kurtosis correction. This result seems to show that non-Gaussianity alone cannot be
responsible for the BEKS periodicity, unless further observations allow very large values
of $S_4$ or $\xi_0$ or decrease the peak-to-noise ratio of the $128\,h^{-1}$ Mpc peak. As the data
stand, we should conclude that the spatial bins cannot be assumed uncorrelated.
Indeed, this is what the large coherent structures reported in deep surveys
seem to require. We also compared the peak occurrences of the full spectrum of BEKS
and found them in good agreement with our non-Gaussian probability distribution if large
values for $\gamma_2$ are allowed. This raises the
interesting possibility that further pencil beam data can be employed to measure the
parameter $\gamma_2$ (i.e. the product $S_4 \xi_0$) down to very deep distances.

Let us conclude by remarking that it is not unlikely that further higher order terms in the
Edgeworth expansion are significant. Further terms would however require the knowledge of
new scaling coefficients, like $S_6$ and so on, which are not available. Here we confined
ourselves to the order $1/N$ for simplicity, with the aim of showing how the non-Gaussian
properties of the galaxy clustering have a strong effect on the peak probability estimate. In
this sense, our calculation gives only a lower bound on the probability estimate.
Figure Caption

Fig. 1. 

Cumulative probability distribution for the Gaussian model (dashed straight line), for the
non-Gaussian models with $\gamma_2 = 5, 20, 40$ (solid lines, bottom to top), and for the BEKS
data (filled squares). The vertical long-dashed line marks the peak-to-noise ratio of BEKS,
$X = 11.8$.

Fig. 2. 

Probability to find a peak as high as, or higher than, the $128\,h^{-1}$ Mpc peak of BEKS as a
function of the crucial parameter $\gamma_2$. The long-dashed curve is for a value of $X$ ten percent
smaller than $X_{\rm BEKS} = 11.8$. The horizontal dashed lines show the rejection levels of 99.7%.